2024-08-21 09:46:13.AIbase.11.2k
Llama3 Compressed Version! Nvidia Releases Small Language Model Llama-3.1-Minitron4B with Only 400 Million Parameters
Nvidia's research team has successfully launched Llama-3.1-Minitron4B using model pruning and distillation techniques. This is a compressed version of the Llama3 model, aimed at implementing artificial intelligence on devices. The model reduces the parameter count of the original 8B model through deep and width pruning techniques while maintaining performance close to larger models. Despite a significant reduction in training data (by 40 times), the model achieved a 16% performance improvement on the MMLU benchmark.